Caching, the secret behind it all
What exactly is caching? To simplify it in one sentence: caching is a mechanism we use to decrease the amount of time and effort it takes to perform some work. That's the one-line explanation. A more technical definition would be: caching is keeping a subset of some data, not the whole thing, in a location that is faster and easier to access than the primary store. Which subset we keep depends on a lot of parameters: how the data is used, the frequency of use, the probability of it being needed again soon, and so on. So, technically speaking, caching is a mechanism through which we decrease the amount of time and effort it takes to retrieve data or to perform some kind of operation.

This single mechanism is a huge factor in a lot of high-performance applications. So let's go through a couple of examples and understand where exactly people use caching, how it impacts the performance of an application, and how important it is.
To give you a hint: it is very important, at least for high-performance applications that track latency in two-digit microseconds or milliseconds.

Let's take the example of Google. Pretty much all of us use, or have used, Google search in our browsers. What exactly happens when you search a query in Google? You type something in the search bar, you hit enter, and that query is processed by the Google search engine. They have a pretty complex algorithm and workflow that every query goes through, typically involving crawling, indexing, and ranking billions and billions of web pages, and that whole process is expensive, computationally expensive: it takes a lot of computing power, a lot of CPU, a lot of memory and other resources. Now, a query like "what is the weather today" is searched millions of times every day. Without the mechanism called caching, Google's servers would need to recompute the results for every single query about the current weather in a location, which would lead to very high latency and, of course, very high server load. Every single query would require going through the whole index, running all the ranking algorithms, and fetching the results, which would significantly slow down the response times.
Now let's insert caching into this picture. What the Google search engine does is use a distributed in-memory caching system to store results (we'll see what an in-memory caching system is soon). Distributed in the sense that the servers of the caching system are spread across the whole world; they are not concentrated in a single location. Those cache servers store the results returned by the ranking and other algorithms involved in the whole Google search workflow. When a user searches, the system first checks whether the results of that particular query, say the weather in some city in India, are present in the cache or not. If they are, that is a cache hit; that's the technical term we use whenever we find the data we are looking for in the cache. The system returns the results instantly, because retrieving data from a cache is very fast; that's one of the reasons we use caches in the first place, and we'll soon see why exactly fetching data from a cache is fast compared to the traditional disk-based databases we saw in the previous video. For now we are just going through some examples to understand the importance of caching on different platforms. Now let's say the system does not find the result: the query the user typed has not been seen before, so the cache does not have the data. In that case the system goes through the normal workflow, all the ranking and sorting algorithms involved, takes that result, and caches it, so that the next time the same user or some other user types the same query, those results can be fetched from the cache and returned to the user. That's one example.
Let's take another popular platform: Netflix. How exactly does Netflix, a huge global streaming platform, deliver content like movies, series, and anime to millions of users all over the world? It streams a large amount of data, and when we say large, it can be multiple terabytes, because of the way these streaming platforms work. For a single video, say a single movie, Netflix stores several resolutions. Say we have a movie called Movie One: it goes through a process called encoding, which prepares different resolutions for different devices and different network speeds. At a high level, there might be a 1080p version, another at 720p, another at 480p, and so on. Depending on your network speed and the device you're accessing the content from, Netflix dynamically sends an optimized version of that movie, so that you don't waste your bandwidth and the load on the Netflix servers also decreases. That's the encoding part.

But how does Netflix deliver hundreds and thousands of terabytes of data to millions of users spread all over the world, with minimal buffering and without crashing its servers? It does this using something called a CDN, a content delivery network. Netflix has its own origin servers, say somewhere in the US: data centers in various locations, with server racks that store the actual movies. But it goes an extra mile. Imagine the whole world: at many locations across it, chosen based on available infrastructure, network connectivity, and a lot of other parameters, there are servers. These are called edge locations, because the servers are strategically placed so that the latency for users' connections in that region is minimal, compared to every request going all the way to the origin servers in the US. If a single set of data centers in the US served requests from the whole world, people in the US would have low buffering times, because geographically they are closer to the servers serving the content, while people on the other side of the world, say in India, would see much higher latency.

This is a strategy that a lot of companies use, not just Netflix. CDNs, or content delivery networks, are very common, and companies small and big use them to efficiently deliver all kinds of content: video files, images, and also web pages, meaning pages made of static assets like JavaScript, HTML, or CSS. Those kinds of assets are also served from CDNs. When you use a platform like Vercel, they also have an edge network like this, with servers spread across the world, so when you make a request to a frontend app deployed on Vercel, depending on your location that request is served from the region or server closest to you instead of going to the origin server.

So CDNs are very common, and this is how Netflix, the global streaming platform, serves billions of users huge volumes of data every single minute: by caching the content at edge locations. We come back again to the definition of caching: taking a subset of a primary store and placing it somewhere such that accessing data from that source is much, much faster than accessing it from the primary store. Netflix takes a subset of its data; of course it does not cache all of its storage at every edge location, which would incur a lot of cost and a lot of resources. It uses many algorithms, including machine learning algorithms, based on what type of content people in that region are watching, trend analysis, real-time data, and so on. A lot of computation goes into deciding what subset of the data to cache so that the time and effort to access that data can be decreased. You cannot just put everything in the cache: that defeats the whole purpose, and caching is also more expensive than storing the data in secondary, disk-based storage. That is the reason we don't put everything in cache memory; disk storage is relatively cheaper and also offers a lot more capacity.
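Since an edge location can hold only a small subset of the catalog, it also has to evict something when it fills up. The real selection logic, as described above, involves trend analysis and machine learning; a much simpler stand-in policy is least-recently-used (LRU). Here is a minimal sketch, with made-up content keys:

```python
from collections import OrderedDict

class EdgeCache:
    """Tiny least-recently-used cache: keeps only the hottest few items."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()           # insertion order tracks recency

    def get(self, key):
        if key not in self.items:
            return None                      # cache miss: would fetch from origin
        self.items.move_to_end(key)          # mark as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:  # over capacity:
            self.items.popitem(last=False)   # evict the least recently used item

edge = EdgeCache(capacity=2)
edge.put("movie-1/1080p", b"...")
edge.put("movie-2/1080p", b"...")
edge.get("movie-1/1080p")                    # movie-1 is now the most recent
edge.put("movie-3/1080p", b"...")            # evicts movie-2, not movie-1
```

The eviction step is what enforces the "subset" property: the edge never holds more than its capacity, and what it keeps is biased toward what users in that region actually request.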
Moving on to a third example, let's take the platform called X, previously known as Twitter. For this you can take any social media example; it applies to Facebook, LinkedIn, YouTube, and so on, because all these social media platforms implement the same kind of strategy. If you're familiar with how Twitter works, it has a section called trending topics. What Twitter (or X) does is identify trending topics by analyzing millions and millions of tweets in real time: it analyzes the tweets people are making all over the world and extracts patterns and trends from them. That calculation is very expensive. It takes a lot of machine-learning-based algorithms, a lot of GPUs, and a lot of infrastructure, and it also involves processing a huge amount of data, terabytes and terabytes, because we are talking about analyzing the tweets of millions and billions of people all over the world.

Imagine if Twitter ran this whole calculation every time some user opened the trending section: the servers would crash in minutes or seconds, because there are billions of people, and if even half of them access the trending section and every user triggers this expensive calculation, the servers cannot handle that. So, of course, to avoid doing this heavy computation for each request, Twitter caches the result.

By now you can see the pattern. Whenever there is a situation involving either a lot of heavy computation or a heavy amount of data, when we want to avoid doing a lot of computation again and again, or we want to avoid repeatedly sending large amounts of data to a lot of users, those are the two common scenarios where caching comes into play.

Coming back to the Twitter example: Twitter caches the trending topics because a trend in a particular region does not change in seconds or minutes. Say your country has ongoing elections; "elections" being a trending topic on Twitter is not something that changes within seconds. It will stay in the trending section for at least a couple of hours, or a couple of days. Since it is not changing minute to minute, it is very safe to cache the trending section. So what Twitter does is take all this data, all these millions and billions of tweets, and every few minutes (we cannot know the exact algorithm or the exact interval Twitter uses, but take this as a rough estimate for the sake of the example) it runs different machine learning and trend-detection algorithms, and a lot of other complicated stuff, over data from different regions, and it stores the results in an in-memory cache. We'll see some famous and popular in-memory cache technologies later, but Redis is one of them, and it is very popular. So imagine Twitter calculates all these trending topics and stores them in an in-memory key-value-store-based database like Redis. When users request the trending section, it takes that data from the cache and sends it to the user instead of computing all of that again. That is the reason the moment you open your phone you get that data instantly, with no significant loading time. Of course, if you have a very bad internet connection you'll see loaders, but with a generally fast connection the whole UI interaction is pretty fast.

With those three examples we have a pattern: every time we want to avoid some kind of heavy computation, or avoid moving some heavy data repeatedly, we resort to what we call caching.
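The trending flow described above, recompute every few minutes in a background job, serve every user read from the cache, can be sketched like this. The `compute_trending` placeholder, the interval, and the dict-with-expiry standing in for Redis are all illustrative assumptions:

```python
import time

trending_cache = {}      # stands in for an in-memory store like Redis
RECOMPUTE_EVERY = 300    # seconds; the real interval is not public

def compute_trending(region: str) -> list[str]:
    """Placeholder for the expensive ML / trend-detection pipeline."""
    return [f"#election-{region}", f"#cricket-{region}"]

def refresh_trending(region: str) -> None:
    # Run periodically by a background job, NOT once per user request.
    trending_cache[region] = (compute_trending(region), time.time() + RECOMPUTE_EVERY)

def get_trending(region: str) -> list[str]:
    topics, expires_at = trending_cache[region]
    if time.time() >= expires_at:   # entry went stale: refresh, then serve
        refresh_trending(region)
        topics, _ = trending_cache[region]
    return topics

refresh_trending("IN")   # the background job warms the cache
get_trending("IN")       # every user read is a cheap cache lookup
```

The key design choice is that the expensive pipeline runs on a schedule, decoupled from user traffic, so a billion reads cost a billion dictionary lookups, not a billion recomputations.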
Now that we know caching is an important concept implemented everywhere, let's limit the context to backend engineering, that is, to the concepts you as a backend engineer will deal with in your day-to-day development. If you go looking for implementations of caching in every context, from hardware to software to browsers and so on, it will take a lot of time; that knowledge is very good to have, but it is difficult to cover in a single video, so we'll keep it relevant at a high level.

When we talk about caching, you will see three levels that you'll notice most frequently as a backend engineer: first, network level; second, hardware level; and third, software based. Software-based caching is of course not purely software: it also relies on hardware caching underneath for its performance. It's called software based because of the means of interaction: you'll be dealing with some kind of library, some kind of software, to interact with the cache.

Now let's go into each of these, starting with the network cache. Within network caching there are many different use cases, but
the two major ones you'll see frequently are these. The first is the CDN, which I just explained in the Netflix example of delivering terabytes of content to users all over the world; there we discussed CDNs briefly, and here we'll go a little deeper. The second is DNS. DNS is also something we use every day; you don't really notice it, but it is happening, and caching is a very important part of DNS querying. So let's explore these two things that fall under network-level caches.

First, the CDN, or content delivery network. The whole idea of a CDN is to cache content on servers that are, as I said, geographically closer to the end users. That's why they are also called edge nodes or edge servers; any time you notice the word "edge", it basically means the work is happening on a server closest to you, instead of on a single server or set of servers in one particular region, the origin server. CDNs use this concept: content is cached on, and served from, the servers closest to the users, to minimize latency and to minimize the computation and resource usage of the origin server.

At a high level, this is how a CDN works. A user requests some kind of resource, an image, a video, or a web page, by entering a URL in the browser; that's how we typically access data over the internet. The browser sends a DNS query to resolve the domain name we typed into an IP address, because that is the job of DNS: it takes domain names and converts them into IP addresses so that our browsers can reach the actual location of the server. Next, the CDN's DNS system routes that request to the nearest PoP. PoP, or point of presence, is just a fancy term: it means a particular region where there are multiple edge servers. A PoP is simply a collection of edge servers for a particular region. The CDN's DNS system routes the request to the nearest PoP, the nearest region with edge servers that can serve the content; this routing responsibility is why DNS is such an important component of a CDN. It routes the request based on multiple parameters: the user's geographic location, and also network conditions, whether the user has a stable internet connection or a bad one. Depending on that, it might route to a different PoP or edge server. For example (and this is just a rough example), if a particular PoP only holds the 1080p format of a video and you have a bad internet connection, the CDN's DNS system may route you instead to a PoP that has lower-quality versions, so that you can load the resource with minimal buffering. Or, if a user in New York requests a resource, the CDN will direct the request to a PoP near New York. That's how requests are routed to the nearest PoP using DNS.

Next, once the request reaches the PoP, an edge server checks whether the requested content, that particular video, web page, or image, is in the cache or not. Depending on that, it is either a cache hit or a cache miss. As I already said, if what we're looking for is already in the cache, we call it a cache hit; if not, it's a cache miss. On a cache hit, the edge server finds the resource and just sends it to the user; that's the happy path. On a cache miss, the content is not in the cache, so the edge server handling the request fetches it from the origin server. Say the origin servers are in the US and the user is in India: it is the responsibility of that edge server to fetch the resource from the US origin servers, where the actual resource lives. The cache holds copies of resources, but the origin servers hold all of them, so the CDN can always fetch from the origin and serve users the content. That's how a typical CDN, a content delivery network, works, using DNS, points of presence, origin servers, and all these buzzwords.

CDNs also use a concept called TTL, time to live. Based on the TTL, they decide how long to keep a particular piece of content. Usually companies decide on some fair duration, for example "this content should only be cached for the next few hours", because after that there might be a new version, or regional changes, and so on. It's a safe default: after that amount of time, if a new request comes in, instead of serving the old content the edge fetches fresh content from the origin server and serves it with a new TTL. That is the typical CDN workflow, and it is considered a network-level cache.

The second use case of caching at the network layer is DNS. DNS queries also make heavy use of caching, to minimize the latency of all the DNS queries coming from billions of users at the same time.
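Before moving on to DNS, the CDN flow above, route to the nearest PoP, then hit the edge cache or fall back to the origin and re-cache with a TTL, can be sketched like this. The regions, paths, and routing table are made up for illustration:

```python
import time

ORIGIN = {"/movie-1/1080p": b"video bytes"}   # stands in for the US origin servers
POPS = {"new-york": {}, "mumbai": {}}         # one small cache per point of presence
TTL_SECONDS = 3600                            # an example "fair duration"

def nearest_pop(user_region: str) -> str:
    # Stand-in for the CDN's DNS routing; real routing also weighs
    # network conditions, server load, and what each PoP has cached.
    return {"US": "new-york", "IN": "mumbai"}[user_region]

def serve(user_region: str, path: str) -> bytes:
    cache = POPS[nearest_pop(user_region)]
    entry = cache.get(path)
    if entry and time.time() < entry[1]:                # cache hit, still fresh
        return entry[0]
    content = ORIGIN[path]                              # miss or expired: go to origin
    cache[path] = (content, time.time() + TTL_SECONDS)  # re-cache with a new TTL
    return content

serve("IN", "/movie-1/1080p")   # first request from India: miss at the mumbai PoP
serve("IN", "/movie-1/1080p")   # second request: hit at the mumbai PoP
```

Note that each PoP warms up independently: a request from the US still misses at the New York PoP even after Mumbai has the content cached.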
Let's imagine another user scenario, so it's easier to keep track of how DNS actually works and where exactly it implements the caching workflow. A user enters a domain name, say example.com, in their browser and presses enter. Of course, when you type something in your URL bar a lot of things happen; here we'll focus only on the DNS part. After you hit enter, the user's device, laptop or smartphone, sends a DNS query to something called a recursive resolver (it's a technical term; we'll see what it means). This resolver is typically provided by the user's ISP, the company you get your internet connection from; if you're in India it could be Jio, ACT, or Airtel. It could also be provided by a third-party DNS provider: Google runs a public DNS service, and so does Cloudflare. So the DNS query is routed to a recursive resolver, which can live at your ISP or at a public DNS service like Google's or Cloudflare's; it does not matter which.

Once the query reaches the recursive resolver, it does the following steps. First, it checks its local cache to see whether it already has the IP for that domain. "Local" here means within its own system, its primary or secondary storage, without reaching out to any remote server. Depending on that, it is either a cache hit or a cache miss. If it is a cache hit, the resolver immediately returns the result; in this case the DNS query was asking for the IP of example.com, so if example.com is in its local cache, the resolver finds the corresponding IP address and returns it to our browser or smartphone.

But if it is a miss, if the resolver did not find the entry in its local cache, it queries other servers, starting with what is called querying the root server. There are 13 root server identities, each replicated across many locations around the world. The root servers don't have the IP address of example.com itself, but they do have the addresses of, or referrals to, the top-level domain (TLD) servers: the servers for .com, .in, and so on. Since our query is for example.com, the root server sends back the address of the .com TLD server. Next, the resolver reaches the TLD server for .com. That server also does not have the IP address of example.com, but it returns the address of the authoritative name server for example.com. The recursive resolver, which has the responsibility of finding the IP address for this particular domain name, is called recursive because it keeps going deeper and deeper through different servers until it finds the IP address. At last it reaches the actual authoritative name server for example.com, takes the IP address from there, and returns it.
Where exactly does caching come into play here? DNS depends heavily on caching precisely because of how much work it is to recursively reach out to all these different servers and finally get the IP. Caching means DNS does not have to do all that work for every single request: each component keeps some kind of local cache it can use for further requests, so the full resolution only has to happen once per some amount of time.

The first level of caching is the browser: most modern web browsers, like Chrome and Firefox, maintain their own DNS cache, and they check it before asking anyone else. If they have the IP address for a domain from some previous request, they can skip the rest of the chain and immediately access the website. The next level is the operating system: Windows, macOS, and Linux all maintain a local DNS cache, which is checked before the query ever leaves your machine for the recursive resolver; if the entry is found there, the device skips the recursive resolver, takes the IP address, and continues with the normal browser flow. The next level of caching is the resolver itself: the recursive resolvers we talked about, whether provided by your ISP or by a public service like Google Public DNS or Cloudflare, also maintain their own caches, so that when they find a corresponding entry for a request they can return it immediately and skip walking through all those different servers. And beyond that, authoritative name servers can also have some caching; not all of them implement it, but some do, letting them skip making further requests to other servers.
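The layered caches described above, browser, then operating system, then recursive resolver, with the full recursive walk only when every layer misses, can be sketched like this. The address (taken from a documentation IP range) and the simplified lookup chain are illustrative assumptions:

```python
# Layered DNS caching sketch: each layer is checked in order, and the
# expensive recursive resolution runs only when every cache misses.

browser_cache: dict[str, str] = {}
os_cache: dict[str, str] = {}
resolver_cache: dict[str, str] = {}
recursive_lookups = 0

def recursive_resolve(domain: str) -> str:
    """Stand-in for walking root -> TLD -> authoritative name servers."""
    global recursive_lookups
    recursive_lookups += 1
    return "203.0.113.7"  # made-up answer from a documentation address range

def resolve(domain: str) -> str:
    for cache in (browser_cache, os_cache, resolver_cache):
        if domain in cache:            # hit at this layer: stop here
            return cache[domain]
    ip = recursive_resolve(domain)     # every layer missed: do the full walk
    for cache in (browser_cache, os_cache, resolver_cache):
        cache[domain] = ip             # populate every layer on the way back
    return ip

resolve("example.com")   # miss everywhere: one full recursive walk
resolve("example.com")   # hit in the browser cache: no network work at all
```

In reality each layer also honors the record's TTL, just like the CDN case; the sketch omits expiry to keep the layering itself visible.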
So as you can see, DNS also implements caching heavily, at multiple layers.

Next is hardware-level caching. If you are from a computer science background, you'll already be familiar with these concepts: L1 cache, L2 cache, L3 cache, main memory, hard disk, and so on. Caching is implemented at the hardware level to make computation and a lot of CPU-level operations faster. At a high level, without going too deep into the technical side: we have the CPU cores, which do the actual processing; then multiple layers of caches, the L1 and L2 caches, and the L3 cache, a memory unit shared between the different CPU cores; then the RAM, or random access memory, also called main memory; and then secondary storage: hard disks, network storage, and so on. The CPU keeps these small memory units, the L1, L2, and L3 caches, so that it can cache repeated computations, or data that is frequently used or about to be used. To give an example, if you are familiar with arrays: the reason arrays are a very good data structure for sequential access is that the moment you start traversing the elements of an array from a particular direction, say 1, 2, 3, 4, 5, the processor's predictive (prefetching) mechanisms pull the surrounding block of memory into a cache, L1 or L2, depending on the prediction. That's why accessing elements sequentially in an array is a very fast operation: the processing unit's predictive algorithms work in its favor. That is just one example of how hardware-level caching works.
The thing we want to focus on here is the random access memory, the main memory. Whenever we talk about the major caching technologies, say Redis or Memcached, these are called in-memory key-value databases, and they are called in-memory databases because they store data in your main memory, the RAM. It is called random access memory because of how it compares to hard disks or SSDs: in a hard disk, data access is a mechanical operation, there is a head that moves over a revolving disk until it finds the data. Main memory, by contrast, is built from capacitors and the like, and through electrical signals it can address any part of memory directly by its address. It does not matter from which direction or in which order you access the data, the access time is almost constant, which is why it is called random access memory, and it is very fast. The problem is that we do not have main memory in abundance. Even though we have speed at our disposal with RAM, it has limited capacity, and it is volatile, which means that whenever you turn off the power, whatever data is stored in RAM goes away; your computer starts fresh every time you boot it. Because of that volatile nature, and because of the trade-offs made at the hardware level, trading non-volatility and capacity for speed, we cannot completely replace hard disks or traditional disk-based storage with RAM. RAM has its use, it is fast for data access and retrieval, but it is not a replacement for secondary storage. Storing data in secondary storage is permanent, in the sense that it is not volatile: whether your program is running or not, whether your computer is on or not, the data persists, because it is physically written to the disk. That is why both have their uses: primary storage gives very fast data access, and secondary storage is slower but provides the safety of persistence.
That is the reason technologies like Redis or Memcached make use of the main memory to store their data, and behind the scenes, for persistence, they make use of secondary storage. Through some mechanism, when the program starts up, they load the data from secondary storage back into main memory, so you still get persistence, but when you read the data, you read it from primary memory. It is the responsibility of these programs, Redis, Memcached, or whatever in-memory caching technology you are using, to implement persistence with the secondary storage; the actual data access, whether retrieving or modifying, happens against primary memory. That is why data operations on these databases are so fast. Finally, coming back to the context of backend development: this is where technologies like Redis or Memcached, or, if we are talking about cloud technologies, AWS ElastiCache, come into play. They provide storage backed by primary memory, which is why data access is so fast. We call these technologies in-memory key-value NoSQL databases, and each part of the name means something. First, they are called in-memory databases because, compared to traditional databases like PostgreSQL or MySQL, the data is not stored on disk; it is stored in RAM, the primary storage, and that is why they are so fast. Second, they are key-value based: compared to traditional relational databases, which have a strict schema where you create tables and rows and so on, the data structures here are pretty simple. You have keys, and for each key you can store a value, which can be a list, a JSON document, a string, a number, and so on; different technologies offer different data types, so we won't go deep into that, you can do your own research. And they are NoSQL because they don't enforce the strictness of traditional SQL databases. That's about the name.
Now, since we are talking about databases that we use in our day-to-day backend development, there are two main caching strategies. The first is lazy caching, also known as cache-aside, and the second is known as write-through; we have actually already touched on this, so we won't go very deep. All in all, lazy caching means this: a client requests some resource, and the server first checks whether that resource is present in the cache or not. If it is, it returns that resource. If it is not, it fetches it from the primary storage, stores it in the cache, and then returns the result to the client, so that the next time this client, or some other client, makes the same request, the server checks the cache, finds the resource there, and returns it. It is called lazy caching because you don't proactively cache things by predicting client interactions or patterns; you only cache something when someone actually requests it. It is a pattern you will see in a lot of different contexts in backend development.
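The cache-aside flow above can be sketched in a few lines. This is an illustrative, in-process stand-in: `slow_db`, `cache`, and `get_user` are made-up names, with a plain dict playing the role of Redis.

```python
slow_db = {"user:1": {"name": "Alice"}}   # stands in for the primary datastore
cache = {}                                 # stands in for Redis/Memcached
db_hits = 0                                # counts how often we touch the "database"

def get_user(key):
    global db_hits
    if key in cache:                       # 1. check the cache first
        return cache[key]
    db_hits += 1
    value = slow_db[key]                   # 2. miss: fetch from primary storage
    cache[key] = value                     # 3. populate the cache lazily
    return value                           # 4. return the result to the client

get_user("user:1")   # first call: cache miss, goes to the database
get_user("user:1")   # second call: served from the cache
assert db_hits == 1  # the database was only hit once
```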
The second strategy is called write-through caching, and it is more about the cache-update strategy. It means that every time something changes, every POST, PUT, or PATCH call where you are actually changing or creating a resource in your primary storage, you make the change in the database and, at the same time, in the same API call, in the same execution flow, you also make the same change in your cache, so that the next request can be served from the cache. This also means all your write operations carry more overhead, because you have to update the database and the cache at the same time, but the advantage is that your cache is always fresh: it is never stale, you never serve old data, since the database and the cache are updated together. Those are the two major strategies you will see frequently in your day-to-day development.
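The write-through flow described above can be sketched the same way; again, `db`, `cache`, and `update_user` are illustrative stand-ins, not a real client API.

```python
db = {}       # stands in for the primary datastore
cache = {}    # stands in for Redis/Memcached

def update_user(key, value):
    db[key] = value        # write to the primary storage...
    cache[key] = value     # ...and to the cache, in the same execution flow

def get_user(key):
    if key in cache:       # reads are always fresh: writes updated both stores
        return cache[key]
    return db.get(key)

update_user("user:1", {"name": "Bob"})
# the cache and the database never disagree after a write
assert get_user("user:1") == db["user:1"] == {"name": "Bob"}
```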
The next major component, when we are talking about technologies like Redis or Memcached, is the eviction policy, something else we should be aware of. In-memory caches like Redis use primary storage, the RAM, which is limited in capacity compared to secondary storage, so it is pretty obvious that at some point we will run out of memory; and if we are using a managed service like ElastiCache, it has a storage limit too. In these conditions we are bound to run out of memory at some point, and at that point we have to decide what data to delete from our cache. From the initial part of the video we already discussed that a cache is only a subset of the data, the frequently accessed part, stored in a different location so that retrieval is fast. The keyword to focus on is subset: we cannot store all of the primary storage in the cache, so at some point we have to let go of some data so that new data, which is more important or higher priority than the old, can be stored. That is why we have eviction policies: an eviction policy decides how you get rid of old data so that new data can be stored in the cache. A few common strategies. The first is noeviction, which means you do not configure any eviction policy at all: the next time you try to insert data into a full cache, you get an error that the memory is full. It does not make much sense in practice, but if you have not configured any eviction policy, that is the behavior, and the configuration is called noeviction. The second is LRU, also known as least recently used.
This algorithm checks which pieces of data were least recently used. Say we have four keys in the cache, 1, 2, 3, and 4, and the memory is now full. The cache keeps track of when each key was last accessed: say keys 1, 2, and 3 were accessed today, but key 4 was last accessed yesterday. Now a new data point comes along, say we want to store key 5. The cache is full, so it has to make space for 5. With the least-recently-used eviction policy configured, it checks which data point has the oldest access time, and since key 4 was accessed yesterday while all the others were accessed today, it selects key 4: 4 goes out and 5 goes in. That is what we mean by the least recently used, or LRU, eviction policy.
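The LRU mechanics above can be sketched with Python's `OrderedDict`, which remembers insertion order and lets `move_to_end()` mark a key as most recently used. This is an illustrative exact-LRU sketch; real Redis approximates LRU by sampling keys, for efficiency.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()          # oldest entry first

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # key is now most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
        self.data[key] = value

cache = LRUCache(4)
for k in ("one", "two", "three", "four"):
    cache.put(k, k)
cache.get("one"); cache.get("two"); cache.get("three")  # "four" is now oldest
cache.put("five", "five")                               # full: evicts "four"
assert cache.get("four") is None and cache.get("five") == "five"
```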
Third, we also have LFU, which means least frequently used. Going back to the earlier example, forget the access times for a moment: we have four keys, 1, 2, 3, and 4, and key 5 comes along. Instead of tracking when each key was last accessed, the cache tracks the frequency of access. Say key 1 was accessed 5 times so far, key 2 was accessed 10 times, key 3 six times, and key 4, say, 23 times, some random numbers. When key 5 arrives, the cache checks which data point is least frequently used, meaning its access frequency is the lowest. Key 1 was accessed the fewest times, the other keys are accessed more frequently, so it selects key 1: 1 goes out and 5 goes in. That is what we mean by the least frequently used, or LFU, eviction policy.
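Picking the LFU victim from the counts in that example is a one-liner; here is an illustrative sketch using the same numbers (key 1 accessed 5 times, key 2 ten, key 3 six, key 4 twenty-three).

```python
from collections import Counter

access_counts = Counter({"one": 5, "two": 10, "three": 6, "four": 23})

def lfu_victim(counts):
    # the key with the smallest access frequency is evicted first
    return min(counts, key=counts.get)

assert lfu_victim(access_counts) == "one"   # accessed only 5 times: it goes out
```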
Then we also have TTL-based, time-to-live-based, eviction. In Redis we can configure a TTL, a time to live, for individual keys, and depending on that TTL a key can be invalidated automatically when it expires. This can also be part of the eviction policy: in the same scenario, when key 5 comes in, the cache checks which of the existing keys has the lowest remaining TTL, that is, which key is going to expire soonest, selects that key, takes it out, and inserts the new one. That is what we mean by TTL-based eviction. And that is pretty much all you need to know about the technicalities of in-memory caches like Redis; the rest is quite straightforward. You take whatever compatible library is available in your programming language, Node.js has node-redis, for example, and you just use it. It is not very complex: you provide a key and a value to store something, and you provide the key to get the value back. There are no complexities like SQL queries and aggregations; it is pretty simple to access. But all this technical familiarity with how the technology works behind the scenes, and what the major components are, helps you make sense of the whole thing and make better decisions.
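That set/get simplicity, plus a per-key TTL, can be sketched in a few lines. This is an in-process illustration in the spirit of Redis's `SET key value EX seconds` and `GET key` commands; `store`, `set_key`, and `get_key` are made-up names, and the expiry here is checked lazily on access.

```python
import time

store = {}   # key -> (value, expiry timestamp or None)

def set_key(key, value, ttl=None):
    expires = time.monotonic() + ttl if ttl is not None else None
    store[key] = (value, expires)

def get_key(key):
    if key not in store:
        return None
    value, expires = store[key]
    if expires is not None and time.monotonic() > expires:
        del store[key]            # lazily expire the key on access
        return None
    return value

set_key("weather:today", "sunny", ttl=0.05)   # expires in 50 ms for the demo
assert get_key("weather:today") == "sunny"
time.sleep(0.1)
assert get_key("weather:today") is None       # TTL elapsed: the key is gone
```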
Now that we have discussed the technicalities of in-memory databases like Redis and Memcached, let's see some examples: what are the use cases where we actually use these in-memory databases in a typical backend engineering workflow? One of the major use cases is database query caching. Let's say you have an SQL query with a lot of joins across many tables, doing a lot of aggregation, and finally ending up with a few rows. It is a very compute-intensive operation, because you have a large data set, say millions and millions of rows. Through monitoring, you have noticed that the API which runs this particular database query is hit pretty frequently, say it powers your landing page or your dashboard, a lot of users are hitting it, and it is putting a lot of load on your database. What you can do is cache the result of that query with some TTL: you say, for the next one hour, cache this result. When the next request comes, check whether the result is present in the cache; if so, serve it from there; otherwise do the calculation once and store it in the cache. Whenever some modification happens, you can update the cache manually, delete it, or handle it however fits. Then, whenever users hit that API, instead of putting load on the database, you serve the result of the query directly from the cache. That is a good example: whenever you have a very compute-intensive database operation that is called pretty frequently, you can cache the result to save computing time, save resources, and reduce the latency of the API call.
To take some real-world examples: e-commerce platforms like Amazon also cache product details, prices, inventory data, and so on, to avoid querying the database for every single request. Imagine there is a sale going on for a MacBook. If Amazon did not cache the details of that product, then during the sale millions of users would hit that product page, and fetching the image of the MacBook, its description, and all the other product details are database operations. The database would get millions of requests, putting significant load on it for no good reason, because information like the product details of a MacBook does not change very often. It is a very good candidate for caching, and that is why platforms like Amazon cache static data like product details and, in some cases, prices, so they can reduce the load on the database and let it do the important work instead of serving static content. In the same way, social media platforms like Twitter or Facebook cache user profile data. A user profile is not something that changes very often, maybe a couple of times a year, so every time that data is fetched, it is served from the cache instead of from the database. Imagine the profile of some celebrity: the page and the API fetching that celebrity's profile details might get hit, say, a thousand times a day, or, if they have an upcoming movie, maybe a million times a day. Putting all that load on the database does not make much sense, since the profile information is static most of the time; it can be served from the cache, and even if the user makes a change, the cache can be invalidated and the new entry put in. As you can see from the pattern, these are read-heavy workloads, and whenever reads are heavy and writes are infrequent, we can make good use of caching. So database query caching is one of the primary examples of where we use technologies like Redis and other in-memory databases. The second use case is session storage.
If you have watched the authentication video in this playlist, you might already be aware of this: in a typical authentication flow, after a successful authentication, a session token is generated for that user, and that token is stored in some kind of storage, ideally Redis or another in-memory database. Every time the user makes a request or an API call, we have to fetch that session information from somewhere. If we did not use Redis, we would have to fetch it from our database, and as you already know, fetching data from primary memory, in this case a cache like Redis, is much, much faster than fetching data from a disk-based database. That is why, to avoid adding latency to every single API call and to avoid putting load on the database, people usually store authentication session tokens in Redis or another in-memory database. That is another use case of Redis.
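The session-storage pattern above can be sketched like this: after login, generate a token and keep the token-to-user mapping in a fast in-memory store with an expiry, so every subsequent request resolves the token without touching the main database. Illustrative names only; a plain dict stands in for Redis.

```python
import secrets
import time

sessions = {}   # token -> (user id, expiry timestamp)

def create_session(user_id, ttl=3600):
    token = secrets.token_hex(16)                      # random session token
    sessions[token] = (user_id, time.monotonic() + ttl)
    return token

def resolve_session(token):
    entry = sessions.get(token)
    if entry is None or time.monotonic() > entry[1]:
        return None          # unknown or expired session
    return entry[0]

token = create_session("user-42")
assert resolve_session(token) == "user-42"   # fast in-memory lookup per request
assert resolve_session("bogus-token") is None
```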
The third use case is API caching. Say that in your backend you make use of some external API, a weather API for example: you take information from it and do some computation to serve your own frontend. Now, every time your frontend makes a request to your API, if you do not use caching, you also make another request to the weather API to fetch the weather data, and if you have a lot of users making multiple API calls, you end up making thousands of calls to that external API. Since external API calls usually come with some limit, a rate limit, or pricing based on how frequently you call them, you will quickly run up your bill or hit the rate limit. And in this case it is not real-time data: weather data does not change within seconds or minutes, so it is the kind of data that is safe to cache. So what you do is fetch the information from the API once, cache it, and use it for all the subsequent requests coming from your frontend. You can use a TTL of, say, an hour: you cache the data for an hour, and for the next hour all requests use the weather value from the cache; after an hour the cache automatically invalidates it, and the next time a request comes, instead of serving it from the cache, you fetch the weather data again, put it in the cache, and return the response. Whenever it comes to interacting with external APIs, we think about caching so that we can decrease the billing or avoid hitting the rate limit. One last use case that comes to mind, since we are talking about rate limiting: the rate-limiting mechanism itself is also most often implemented using a technology like Redis or another in-memory cache, because of the way rate limiting works.
Rate limiting is usually some kind of middleware, so called because it sits in the middle: before a request is passed on to your route or controller, between the client making the request, a browser or any other client, and the entry point to your server, sits the rate-limit middleware. What it does is read a header that gives it the IP address of the user, usually the X-Forwarded-For header. This header is mostly what is used when implementing rate limiting to find out the public IP address of the client the request is coming from, and it is usually added by a reverse proxy such as Nginx, or whatever you are running in front of your server. The point is, the job of the rate-limit middleware is to take this header out of the request, every request carries some default headers, as you already know, and check the IP address. Rate limits usually have some duration-based rule, like: you can only make 50 requests per second, or per minute. These are mostly implemented to protect against abuse from bots, and if it is a very compute-intensive API, rate limiting can help you save your resources and your compute. Say the rule is that a particular client can only make 50 requests in one minute. Whenever a request comes in, the middleware takes whatever value is in the X-Forwarded-For header, say it finds some IP address, and, since we are talking about a key-value store, it saves a counter: for this IP address, at the start of this minute, the first request came, so the count is one. The next time a request comes, it checks the cache, sees that the minute has not yet passed, and increments the counter from one to two, then three, four; every request increments the counter. If, within that minute, the counter exceeds 50, it blocks the request and sends an error response of 429.
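The counter flow just described is a fixed-window rate limiter, and it can be sketched in a few lines. Illustrative only: a dict plays the role of the key-value store, `allow_request` is a made-up name, and the IP address is an example value.

```python
LIMIT = 50
counters = {}   # (ip, window) -> request count within that window

def allow_request(ip, now):
    window = int(now // 60)            # which minute this request falls in
    key = (ip, window)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= LIMIT      # False -> respond with 429

# simulate 60 requests from one client within the same minute
responses = [allow_request("10.0.0.12", now=5.0) for _ in range(60)]
assert responses[:50] == [True] * 50   # the first 50 requests pass
assert responses[50:] == [False] * 10  # request 51 onward is blocked with 429
```

A new counter starts automatically when the next minute's window begins, which is what resets the limit.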
A status code of 429 means Too Many Requests, and that is what a typical rate-limiting workflow looks like. Now, coming back to caching: the part that stores the counter is usually Redis or another in-memory database. Storing it in a relational database like PostgreSQL or MySQL is possible, of course, it has persistent storage and we can retrieve the data, but the difference is that reading from a relational database takes more time, and even a difference of, say, 20 or 30 milliseconds makes a difference to the latency of the API. If we stored the counters in a relational database, then for every request we would be making a database call. First, the latency of every API would increase, since we would be making an unnecessary database call per request; second, the load on our database would also increase. Say there are a thousand users, each making 100 requests per minute: our database would be flooded with nothing but rate-limiting requests. That is why we want to separate this out, for two reasons: first, to make it as fast as possible so that we minimize API latency, and second, to decrease the database load.
That is the reason, whenever we are talking about implementing rate limiting, we make use of in-memory databases like Redis instead of storing the counters in our relational databases. And with that we have pretty much covered what caching is, why we use it, some real-world examples, why these are called in-memory caches, why RAM, the primary storage, is faster, and the different components of Redis. That covers all you need to know to be comfortable with caching, to understand the contexts where it is used, its different components, and why we use it. Next, you can try these technologies yourself: you can start with Redis, or its open-source alternative Valkey, install a library for your programming language, get your hands dirty, and see for yourself the difference between accessing data from a relational database and from an in-memory database like Redis.